Mining Gene Expression Data

Author

  • Veronica Liesaputra
Abstract

Micro-array data has enabled us to obtain an overview of the cell by measuring the expression levels of thousands of genes simultaneously. It has also opened the possibility of accurately predicting which cells may be cancerous. Since these data sets are very large, we need machine learning techniques to analyse them efficiently. However, micro-array data also poses challenging problems for machine learning algorithms. Previous analyses drew their inferences from the micro-array data alone and ignored all the information surrounding it. Neglecting information, however, runs contrary to standard machine learning wisdom. The main goal of this work was therefore to investigate the hypothesis that prediction could be improved by utilizing additional information. To link the micro-array data with its additional information in WEKA, a new space-efficient linked instance representation had to be invented. This representation enabled effective performance comparisons of various machine learning algorithms on three different sets of gene expression data with varying amounts of additional linked-in information. Overall, we found SMO to be the single best classifier, and voting of the top three classifiers usually improved performance over the single best classifier. The best results were usually achieved using the linked-in additional information, thus confirming our original hypothesis.
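The abstract does not say how the votes of the top three classifiers were combined. As a rough illustration only, the sketch below assumes simple majority voting over predicted class labels. The function name `majority_vote`, the tie-breaking rule, and the sample labels are all hypothetical and are not taken from the paper or from WEKA.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label list by majority vote.

    predictions: list of equal-length label sequences, one per classifier.
    Ties go to the earliest-listed classifier (an assumption of this sketch).
    """
    if not predictions:
        raise ValueError("need predictions from at least one classifier")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all classifiers must predict the same number of samples")

    combined = []
    for votes in zip(*predictions):  # votes for one sample, in classifier order
        counts = Counter(votes)
        best = max(counts.values())
        # The first label (in classifier order) that reaches the top count wins.
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


# Made-up predictions for five samples; these are NOT results from the paper.
clf_a = ["tumour", "normal", "tumour", "normal", "tumour"]
clf_b = ["tumour", "tumour", "normal", "normal", "tumour"]
clf_c = ["normal", "normal", "tumour", "normal", "normal"]

voted = majority_vote([clf_a, clf_b, clf_c])
print(voted)
```

With three voters and two classes a tie cannot occur, so the tie-breaking rule only matters for multi-class problems or an even number of classifiers.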


Similar resources

Prediction of Acid Mine Drainage Generation Potential of A Copper Mine Tailings Using Gene Expression Programming-A Case Study

This work presents a quantitative prediction of the likely acid mine drainage (AMD) generation process in tailing particles from the Sarcheshmeh copper mine in the south of Iran. Four predictive relationships for the remaining pyrite fraction, remaining chalcopyrite fraction, sulfate concentration, and pH have been suggested by applying the gene expression programming (GEP) a...


Prediction of Blasting Cost in Limestone Mines Using Gene Expression Programming Model and Artificial Neural Networks

The use of blasting cost (BC) prediction to achieve optimal fragmentation is necessary in order to control the adverse consequences of blasting such as fly rock, ground vibration, and air blast in open-pit mines. In this research work, BC is predicted through collecting 146 blasting data from six limestone mines in Iran using the artificial neural networks (ANNs), gene expression programming (G...


Forecasting copper price using gene expression programming

Forecasting the prices of metals is important in many aspects of economics. Metal prices are also vital variables in financial models for revenue evaluation, which forms the basis of an effective payment regime using resource policymakers. Given the severe changes in metal prices in recent years, the classic estimation methods cannot correctly estimate the volatility. In order to...


Evaluation of PRR11 gene expression changes and its relationship with tumor size in patients with gastric adenocarcinoma

Introduction: Gastric cancer is one of the most common gastrointestinal tract neoplasms. Because of its invasion, and nonspecific symptoms and signs, the disease is often diagnosed at an advanced stage with short survival. PRR11 participates in the initiation and progression of lung cancer and breast cancer by regulating important genes involved in cell cycles and tumorigenesis. In this researc...


Evaluation of the Prognostic Value and TRIP13 gene Expression in Gastric Cancer

Introduction: Gastric cancer is a major public health issue worldwide. The factors that initiate cancer are not well understood; however, aberrant expression of genes is associated with this cancer. TRIP13 plays pivotal roles in meiotic recombination, DNA repair, and cell cycle progression. An increasing body of evidence suggests that TRIP13 may possess functions other than meiosis and mitosis, ...


Mining and Analysing Spatio-Temporal Patterns of Gene Expression in an Integrative Database Framework

Mining patterns of gene expression provides a crucial approach in discovering knowledge such as finding genetic networks that underpin the embryonic development. Analysis of mining results and evaluation of their relevance in the domain remains a major concern. In this paper we describe our explorative studies in support of solutions to facilitate the analysis and interpretation of mining resul...




Publication date: 2005